10 research outputs found

    Unsupervised generation of parallel treebanks through sub-tree alignment

    Get PDF
    The need for syntactically annotated data for use in natural language processing has increased dramatically in recent years. This is true especially for parallel treebanks, of which very few exist. The ones that exist are mainly hand-crafted and too small for reliable use in data-oriented applications. In this paper we introduce an open-source system for fast and robust automatic generation of parallel treebanks. We expect the opening of the presented platform to the scientific community to help boost research in the field of data-oriented machine translation and lead to advancements in other fields where parallel treebanks can be employed

    Highlighting matched and mismatched segments in translation memory output through sub-­tree alignment

    Get PDF
    In recent years, it is becoming more and more clear that the localisation industry does not have the necessary manpower to satisfy the increasing demand for high-quality translation. This has fuelled the search new and existing technologies that would increase translator throughput. As Translation Memory (TM) systems are the most commonly employed tool by translators, a number of enhancements are available to assist them in their job. One such enhancement would be to show the translator which parts of the sentence that needs to be translated match which parts of the fuzzy match suggested by the TM. For this information to be used, however, the translators have to carry it over to the TM translation themselves. In this paper, we present a novel methodology that can automatically detect and highlight the segments that need to be modified in a TM-­suggested translation. We base it on state-­of-the-art sub-­tree align- ment technology (Zhechev,2010) that can produce aligned phrase-­based-­tree pairs from unannotated data. Our system operates in a three-­step process. First, the fuzzy match selected by the TM and its translation are aligned. This lets us know which segments of the source-­language sentence correspond to which segments in its translation. In the second step, the fuzzy match is aligned to the input sentence that is currently being translated. This tells us which parts of the input sentence are available in the fuzzy match and which still need to be translated. In the third step, the fuzzy match is used as an intermediary, through which the alignments between the input sentence and the TM translation are established. In this way, we can detect with precision the segments in the suggested translation that the translator needs to edit and highlight them appropriately to set them apart from the segments that are already good translations for parts of the input sentence. Additionally, we can show the alignments—as detected by our system—between the input and the translation, which will make it even easier for the translator to post-edit the TM suggestion. This alignment information can additionally be used to pre- translate the mismatched segments, further reducing the post-­editing load

    Automatic generation of parallel treebanks: an efficient unsupervised system

    Get PDF
    The need for syntactically annotated data for use in natural language processing has increased dramatically in recent years. This is true especially for parallel treebanks, of which very few exist. The ones that exist are mainly hand-crafted and too small for reliable use in data-oriented applications. In this work I introduce a novel open-source platform for the fast and robust automatic generation of parallel treebanks through sub-tree alignment, using a limited amount of external resources. The intrinsic and extrinsic evaluations that I undertook demonstrate that my system is a feasible alternative to the manual annotation of parallel treebanks. Therefore, I expect the presented platform to help boost research in the field of syntaxaugmented machine translation and lead to advancements in other fields where parallel treebanks can be employed

    Seeding statistical machine translation with translation memory output through tree-based structural alignment

    Get PDF
    With the steadily increasing demand for high-quality translation, the localisation industry is constantly searching for technologies that would increase translator throughput, with the current focus on the use of high-quality Statistical Machine Translation (SMT) as a supplement to the established Translation Memory (TM) technology. In this paper we present a novel modular approach that utilises state-of-the-art sub-tree alignment to pick out pre-translated segments from a TM match and seed with them an SMT system to produce a final translation. We show that the presented system can outperform pure SMT when a good TM match is found. It can also be used in a Computer-Aided Translation (CAT) environment to present almost perfect translations to the human user with markup highlighting the segments of the translation that need to be checked manually for correctness

    Capturing translational divergences with a statistical tree-to-tree aligner

    Get PDF
    Parallel treebanks, which comprise paired source-target parse trees aligned at sub-sentential level, could be useful for many applications, particularly data-driven machine translation. In this paper, we focus on how translational divergences are captured within a parallel treebank using a fully automatic statistical tree-to-tree aligner. We observe that while the algorithm performs well at the phrase level, performance on lexical-level alignments is compromised by an inappropriate bias towards coverage rather than precision. This preference for high precision rather than broad coverage in terms of expressing translational divergences through tree-alignment stands in direct opposition to the situation for SMT word-alignment models. We suggest that this has implications not only for tree-alignment itself but also for the broader area of induction of syntaxaware models for SMT

    Maximising TM performance through sub-tree alignment and SMT

    Get PDF
    With the steadily increasing demand for high quality translation, the localisation industry is constantly searching for technologies that would increase translator throughput, in particular focusing on the use of high-quality Statistical Machine Translation (SMT) supplementing the established Translation Memory (TM) technology. In this paper, we present a novel modular approach that utilises state-of-the-art sub-tree alignment and SMT techniques to turn the fuzzy matches from a TM into near perfect translations. Rather than relegate SMT to a last-resort status where it is only used should the TM system fail to produce the desired output, for us SMT is an integral part of the translation process that we rely on to obtain high-quality results. We show that the presented system consistently produces better quality output than the TM and performs on par or better than the standalone SMT system

    Robust language pair-independent sub-tree alignment

    Get PDF
    Data-driven approaches to machine translation (MT) achieve state-of-the-art results. Many syntax-aware approaches, such as Example-Based MT and Data-Oriented Translation, make use of tree pairs aligned at sub-sentential level. Obtaining sub-sentential alignments manually is time-consuming and error-prone, and requires expert knowledge of both source and target languages. We propose a novel, language pair-independent algorithm which automatically induces alignments between phrase-structure trees. We evaluate the alignments themselves against a manually aligned gold standard, and perform an extrinsic evaluation by using the aligned data to train and test a DOT system. Our results show that translation accuracy is comparable to that of the same translation system trained on manually aligned data, and coverage improves

    You only need a scalar "only"

    No full text
    We propose a compositional analysis for sentences of the kind "You only have to go to the North End to get good cheese", referred to as the Sufficiency Modal Construction in the recent literature. We argue that the SMC is ambiguous depending on the kind of ordering induced by only. So is the exceptive construction – its cross-linguistic counterpart. Only is treated as inducing either a 'comparative possibility' scale or an 'implication-based' partial order on propositions. The properties of the 'comparative possibility' scale explain the absence of the prejacent presupposition that is usually associated with only. By integrating the scalarity into the semantics of the SMC, we explain the polarity facts observed in both variants of the construction. The sufficiency meaning component is argued to be due to a pragmatic inference

    The MOOC as a resource for the acquisition of digital competence in the formation of primary education teachers

    Get PDF
    En el presente estudio se evalúan las potencialidades pedagógicas del recurso tecnológico MOOC (Massive Open Online Course) en la adquisición y desarrollo de la competencia digital del profesorado. Para ello, se ha examinado su valor formativo tras su implementación en el aula de Grado de Primaria a través de un método de investigación cuantitativo. El instrumento de análisis utilizado ha sido el cuestionario de escala Likert. Los datos obtenidos muestran conclusiones significativas sobre el valor positivo de los MOOCs en la consecución de tales competencias así como la conveniencia de su utilización en la formación de los docentes.This study evaluates the pedagogical potential of the MOOC (Massive Open Online Course) technological resource in the acquisition and development of digital competence of teachers. For this purpose, its formative value has been examined after its implementation in the classroom of Primary Grade through a method of quantitative research. The instrument of analysis used was the Likert scale questionnaire. The data obtained show significant conclusions about the positive value of the MOOCs in the achievement of such competences as well as the convenience of their use in the training of teachers
    corecore